Abstract

To quantify an operational risk capital charge under Basel II, many banks adopt a 
Loss Distribution Approach. Under this approach, quantification of the frequency 
and severity distributions of operational risk involves the bank's internal data, expert 
opinions and relevant external data. In this paper we suggest a new approach, 
based on a Bayesian inference method, that allows for a combination of these three 
sources of information to estimate the parameters of the risk frequency and severity 
distributions. 
1 Introduction

To meet the Basel II requirements, BIS [6], many banks adopt a Loss Distribution Approach
(LDA). Under this approach, banks quantify distributions for the frequency and severity of
operational losses for each risk cell over a one-year time horizon; see, e.g., Cruz [7],
McNeil et al. [19], Panjer [22]. Banks can use their own risk cell structure, but they
must be able to map the losses to the relevant Basel II risk cells (eight business lines times
seven risk types). The commonly used LDA model for an annual loss in a single risk cell
is the sum of individual losses

$$Z = \sum_{k=1}^{N} X_k, \tag{1.1}$$

where $N$ is the annual number of events (frequency) and $X_k$, $k = 1, \ldots, N$, are the
severities of these events.
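To make the compound structure of (1.1) concrete, the following minimal sketch simulates annual losses for one risk cell. It assumes a Poisson frequency and lognormal severities that are independent of the frequency; the parameter values (0.6, 1.0, 2.0), the function name `simulate_annual_loss` and the 99.9% level are illustrative assumptions, not values taken from the paper.

```python
import random

def simulate_annual_loss(lam, mu, sigma, rng):
    """Draw one annual loss Z = X_1 + ... + X_N for a single risk cell."""
    # Frequency: N ~ Poisson(lam), obtained by counting the arrivals of a
    # unit-rate Poisson process on the interval [0, lam].
    n = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        n += 1
        t += rng.expovariate(1.0)
    # Severity: X_k ~ lognormal(mu, sigma), drawn independently of N (assumption).
    return sum(rng.lognormvariate(mu, sigma) for _ in range(n))

rng = random.Random(1)
# Hypothetical parameters: 0.6 expected events per year, lognormal(1, 2) severities.
losses = sorted(simulate_annual_loss(0.6, 1.0, 2.0, rng) for _ in range(20000))
quantile_999 = losses[int(0.999 * len(losses))]   # empirical 99.9% quantile
print(f"mean annual loss {sum(losses) / len(losses):.2f}, "
      f"99.9% quantile {quantile_999:.2f}")
```

With heavy-tailed severities the empirical high quantile is driven by very few simulated years, so a sketch like this is useful for intuition rather than for capital figures.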

Several studies, e.g., Moscadelli [20] and Dutta and Perry [11], analyzed operational risk
data collected over many banks by Basel II business line and event type; see Degen et
al. [10] for a discussion and analysis of these studies. While analyses of collective data 
may provide a picture for the whole banking industry, estimation of frequency and sever- 
ity distributions of operational risks for each risk cell is a challenging task for a single 
bank. The bank's internal data are typically collected over several years. On the one 
hand, there might be some cells with few internal data only. On the other hand, industry 
data available through external databases (from vendors and consortia of banks) are often 
difficult to adapt to internal processes, due to different volumes, thresholds etc. 
Therefore, it is important to have expert judgments incorporated into the model. These 
judgments may provide valuable information for forecasting and decision making, espe- 
cially for risk cells lacking internal loss data. In the past, quantification of operational risk 
was based on such expert judgments only. A quantitative assessment of risk frequency 
and severity distributions can be obtained from expert opinions; see, e.g., Alderweireld 
et al. [2]. By itself, this assessment is very subjective and should be combined with (sup- 
ported by) the analysis of actual loss data. In practice, due to the absence of a sound 
mathematical framework, ad-hoc procedures are often used to combine the three sources 
of data: internal observations, external data and expert opinions. For example, the fre- 
quency distribution is estimated using internal data only, while the severity distribution 
is fitted to a sample combining internal and external data. 

On several occasions, risk executives have emphasized that one of the main challenges in 
operational risk management is to combine internal data and expert opinion with relevant 
external data in an appropriate way; see, e.g., Davis [9], an interview with four of the
industry's top risk executives in September 2006: "[A] big challenge for us is how to mix the
internal data with external data; this is something that is still a big problem because I
don't think anybody has a solution for that at the moment." Or: "What can we do when we don't
have enough data [...] How do I use a small amount of data when I can have external data
with scenario generation? [...] I think it is one of the big challenges for operational risk
managers at the moment."

A "toy" model, based on hierarchical credibility theory, was proposed by Bühlmann et
al. [5] for low-frequency, high-impact operational risk losses exceeding some high threshold.
However, this model can be too sensitive to expert opinions used to estimate scaling
factors for distribution parameters. In the present framework we introduce a model that 
is more robust towards expert opinions. 

We use Bayesian inference as the statistical technique to incorporate expert opinions into 
data analysis. There is a broad literature covering Bayesian inference and its applications 
to the insurance industry and other areas. The method allows for structural modeling 
of different sources of information. Shevchenko and Wüthrich [24] described the use of
the Bayesian inference approach, in the context of operational risk, for estimation of fre- 
quency/severity distributions in a risk cell, where expert opinion or external data are 
used to estimate prior distributions. This allows the combining of two data sources: ei- 
ther expert opinion and internal data or external data and internal data. 
The novelty in this paper is that we develop a Bayesian inference model that allows for 
combining three sources (internal data, external data and expert opinions) simultaneously. 
To the best of our knowledge, no similar model copes comprehensively with this task.
Moreover, one should note that our framework enlarges the classical Bayesian inference
models belonging to the exponential dispersion family with its associated conjugates;
see, e.g., Bühlmann and Gisler [4], Chapter 2.
In Section 2 we develop a suitable method to combine the three types of knowledge in the
context of operational risk. In Sections 3 and 4, this framework is used to quantify loss 
frequency and severity, respectively. Several examples illustrate the quality and the ro- 
bustness of this quantitative approach for operational risk. In Section 5 we briefly discuss 
open challenges when aggregating risk cells and estimating risk capital. 

2 Bayesian Inference 

In order to estimate the risk capital of a bank and to fulfill the Basel II requirements, risk
managers have to take into account information beyond the (often scarce) internal data.
This includes relevant external data (industry data) and expert opinions. The aim of this
section is to provide some well-founded background to combining these three sources of
information. Hereafter we consider one risk cell only.

In any risk cell, we model the loss frequency and the loss severity by a distribution (e.g.,
Poisson for the frequency or Pareto, lognormal, etc. for the severity). For the considered
bank, the unknown parameters $\boldsymbol{\gamma}_0$ (e.g., the Poisson parameter or the
Pareto tail index) of these distributions have to be quantified.

A priori, before we have any company-specific information, only industry data are available.
Hence, the best prediction of our bank-specific parameter $\boldsymbol{\gamma}_0$ is given by
the belief in the available external knowledge such as the provided industry data. This
unknown parameter of interest is modeled by a prior distribution (also called structural
distribution or risk profile) corresponding to a random vector $\boldsymbol{\gamma}$. The
parameters of the prior distribution (so-called hyper-parameters) are estimated using data
from the whole industry by, e.g., maximum likelihood estimation, as described in Shevchenko
and Wüthrich [24]. If no industry data are available, the prior distribution could come from
a "super expert" who has an overview of all banks.

In our terminology, we treat the true company-specific parameter $\boldsymbol{\gamma}_0$ as a
realization of $\boldsymbol{\gamma}$. The random vector $\boldsymbol{\gamma}$ plays the role
of the underlying parameter set of the whole banking industry sector, whereas
$\boldsymbol{\gamma}_0$ stands for the unknown underlying parameter set of the bank being
considered. Note that $\boldsymbol{\gamma}$ is random with known distribution, whereas
$\boldsymbol{\gamma}_0$ is deterministic but unknown. Due to the variability amongst banks,
it is natural to model $\boldsymbol{\gamma}$ by a probability distribution.

As time passes, internal observations $\mathbf{X} = (X_1, \ldots, X_K)$ as well as expert
opinions $\boldsymbol{\vartheta} = (\vartheta^{(1)}, \ldots, \vartheta^{(M)})$ about the
underlying parameter $\boldsymbol{\gamma}_0$ become available. This affects our belief in the
distribution of $\boldsymbol{\gamma}$ coming from external data only and adjusts the
prediction of $\boldsymbol{\gamma}_0$. The more information on $\mathbf{X}$ and
$\boldsymbol{\vartheta}$ we have, the better we are able to predict $\boldsymbol{\gamma}_0$.
That is, we replace the prior density $\pi(\boldsymbol{\gamma})$ by a conditional density of
$\boldsymbol{\gamma}$ given $\mathbf{X}$ and $\boldsymbol{\vartheta}$.

The natural question that arises at this point is: How does this company-specific
information $\mathbf{X}$ and $\boldsymbol{\vartheta}$ change our view of the underlying
parameter $\boldsymbol{\gamma}$, i.e., what is the distribution of
$\boldsymbol{\gamma} \mid \mathbf{X}, \boldsymbol{\vartheta}$?

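The Bayesian inference approach yields the canonical theory answering questions of the
above type. In order to determine $\boldsymbol{\gamma} \mid \mathbf{X}, \boldsymbol{\vartheta}$
we have to introduce some notation. The joint conditional density of observations and expert
opinions given the parameter vector $\boldsymbol{\gamma}$ is denoted by

$$h(\mathbf{X}, \boldsymbol{\vartheta} \mid \boldsymbol{\gamma}) = h_1(\mathbf{X} \mid \boldsymbol{\gamma})\, h_2(\boldsymbol{\vartheta} \mid \boldsymbol{\gamma}), \tag{2.1}$$

where $h_1$ and $h_2$ are the conditional densities (given $\boldsymbol{\gamma}$) of
$\mathbf{X}$ and $\boldsymbol{\vartheta}$, respectively. Thus $\mathbf{X}$ and
$\boldsymbol{\vartheta}$ are assumed to be conditionally independent given $\boldsymbol{\gamma}$.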

Remarks 2.1:

• Notice that, in this way, we naturally combine external data $\boldsymbol{\gamma}$ with
internal data $\mathbf{X}$ and expert opinion $\boldsymbol{\vartheta}$.

• In classical Bayesian inference (as it is used, e.g., in actuarial science), one usually
combines only two sources of information. The novelty in this paper is that we combine
three sources simultaneously using an appropriate structure, i.e., equation (2.1).

• (2.1) is quite a reasonable assumption: Assume that the true bank-specific parameter
is $\boldsymbol{\gamma}_0$. Then (2.1) says that the experts in this bank estimate
$\boldsymbol{\gamma}_0$ (by their opinion $\boldsymbol{\vartheta}$) independently of the
internal observations. This makes sense if the experts specify their opinions regardless of
the data observed. □

We further assume that observations as well as expert opinions are conditionally independent
and identically distributed (i.i.d.), given $\boldsymbol{\gamma}$, so that

$$h_1(\mathbf{X} \mid \boldsymbol{\gamma}) = \prod_{k=1}^{K} f_1(X_k \mid \boldsymbol{\gamma}), \tag{2.2}$$

$$h_2(\boldsymbol{\vartheta} \mid \boldsymbol{\gamma}) = \prod_{m=1}^{M} f_2(\vartheta^{(m)} \mid \boldsymbol{\gamma}), \tag{2.3}$$

where $f_1$ and $f_2$ are the marginal densities of a single observation and a single expert
opinion, respectively. We have assumed that all expert opinions are identically distributed,
but this can be generalized easily to expert opinions having different distributions.
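The unconditional parameter density $\pi(\boldsymbol{\gamma})$ is called the prior density,
whereas the conditional parameter density
$\hat{\pi}(\boldsymbol{\gamma} \mid \mathbf{X}, \boldsymbol{\vartheta})$ is called the
posterior density. Let $h(\mathbf{X}, \boldsymbol{\vartheta})$ denote the unconditional joint
density of observations $\mathbf{X}$ and expert opinions $\boldsymbol{\vartheta}$. Then it
follows from Bayes' Theorem that

$$h(\mathbf{X}, \boldsymbol{\vartheta} \mid \boldsymbol{\gamma})\, \pi(\boldsymbol{\gamma}) = \hat{\pi}(\boldsymbol{\gamma} \mid \mathbf{X}, \boldsymbol{\vartheta})\, h(\mathbf{X}, \boldsymbol{\vartheta}). \tag{2.4}$$

Note that the unconditional density $h(\mathbf{X}, \boldsymbol{\vartheta})$ does not depend
on $\boldsymbol{\gamma}$ and, thus, the posterior density is given by

$$\hat{\pi}(\boldsymbol{\gamma} \mid \mathbf{X}, \boldsymbol{\vartheta}) \propto \pi(\boldsymbol{\gamma}) \prod_{k=1}^{K} f_1(X_k \mid \boldsymbol{\gamma}) \prod_{m=1}^{M} f_2(\vartheta^{(m)} \mid \boldsymbol{\gamma}), \tag{2.5}$$

where "$\propto$" stands for "is proportional to" with the constant of proportionality
independent of the parameter vector $\boldsymbol{\gamma}$. For the purposes of operational
risk, the posterior density is used to estimate the full predictive distribution of future
losses.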

Equation (2.5) can be used in a general set-up, but it is convenient to find some conjugate
prior distributions such that the prior and the posterior distribution have a similar type,
or where, at least, the posterior distribution can be calculated analytically.

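Definition 2.2 (Conjugate Prior Distribution) Let $F$ denote the class of density
functions $h(\mathbf{X}, \boldsymbol{\vartheta} \mid \boldsymbol{\gamma})$, indexed by
$\boldsymbol{\gamma}$. A class $U$ of prior densities $\pi(\boldsymbol{\gamma})$ is said to be
a conjugate family for $F$ if the posterior density
$\hat{\pi}(\boldsymbol{\gamma} \mid \mathbf{X}, \boldsymbol{\vartheta}) \propto \pi(\boldsymbol{\gamma})\, h(\mathbf{X}, \boldsymbol{\vartheta} \mid \boldsymbol{\gamma})$
also belongs to the class $U$ for all $h \in F$ and $\pi \in U$. □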

Conjugate distributions are very useful in practice and will be used consistently throughout
this paper. At this point, we also refer to Bühlmann and Gisler [4], Section 2.5. In
general, the posterior distribution cannot be calculated analytically but can be estimated
numerically, for instance by the Markov chain Monte Carlo method; see, e.g., Peters and
Sisson [23] or Gilks et al. [17].
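When no conjugate structure is available and the parameter is one-dimensional, the posterior (2.5) can also be approximated on a grid. The sketch below uses this simpler grid approximation, not the Markov chain Monte Carlo method cited above. The function name `grid_posterior`, the densities and all numerical values are illustrative assumptions.

```python
import math

def grid_posterior(grid, log_prior, log_f1, log_f2, observations, opinions):
    """Posterior weights on a grid of parameter values, following (2.5)."""
    log_post = [
        log_prior(g)
        + sum(log_f1(x, g) for x in observations)   # internal observations
        + sum(log_f2(t, g) for t in opinions)       # expert opinions
        for g in grid
    ]
    shift = max(log_post)                           # guard against underflow
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical choices: Gamma prior, Poisson counts, Gamma-distributed opinions.
ALPHA0, BETA0, XI = 3.0, 0.2, 4.0

def log_prior(g):
    return (ALPHA0 - 1.0) * math.log(g) - g / BETA0

def log_f1(n, g):
    return n * math.log(g) - g - math.lgamma(n + 1.0)

def log_f2(t, g):
    scale = g / XI                                  # Gamma(shape XI, scale g / XI)
    return ((XI - 1.0) * math.log(t) - t / scale
            - XI * math.log(scale) - math.lgamma(XI))

grid = [0.005 * j for j in range(1, 1001)]          # parameter values in (0, 5]
weights = grid_posterior(grid, log_prior, log_f1, log_f2, [0, 1, 0], [0.7])
posterior_mean = sum(g * w for g, w in zip(grid, weights))
print(f"posterior mean on the grid: {posterior_mean:.3f}")
```

Passing empty lists for the observations and opinions returns the discretized prior, which gives a quick sanity check of the grid resolution.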

3 Loss Frequency 

3.1 Combining internal data and expert opinions with external 
data 

Model Assumptions 3.1 (Poisson-Gamma-Gamma)

Assume that bank $i$ has a scaling factor $V_i$, $1 \le i \le I$, for the frequency in a
specified risk cell (e.g., it can be a product of economic indicators such as the gross
income, the number of transactions, the number of staff, etc.). We choose the following model
for the loss frequency for operational risk of a risk cell in bank $i$:

a) Let $\Lambda_i \sim \Gamma(\alpha_0, \beta_0)$ be a Gamma distributed random variable with
shape parameter $\alpha_0 > 0$ and scale parameter $\beta_0 > 0$, which are estimated from
(external) market data. That is, the density of $\Gamma(\alpha_0, \beta_0)$,
$x^{\alpha_0 - 1} e^{-x/\beta_0} / (\beta_0^{\alpha_0} \Gamma(\alpha_0))$ $(x > 0)$, plays the
role of $\pi(\boldsymbol{\gamma})$ in (2.5).

b) The numbers of losses of bank $i$ in year $k$, $1 \le k \le K_i$, are assumed to be
conditionally i.i.d., given $\Lambda_i$, Poisson distributed with frequency $V_i \Lambda_i$,
i.e., $N_{i,1}, \ldots, N_{i,K_i} \mid \Lambda_i \overset{\text{i.i.d.}}{\sim} \mathrm{Pois}(V_i \Lambda_i)$.
That is, $f_1(\cdot \mid \Lambda_i)$ in (2.5) corresponds to the density of a
$\mathrm{Pois}(V_i \Lambda_i)$ distribution.

c) We assume that bank $i$ has $M_i$ experts with opinions $\vartheta_i^{(m)}$,
$1 \le m \le M_i$, about the company-specific intensity parameter $\Lambda_i$ with
$\vartheta_i^{(m)} \mid \Lambda_i \overset{\text{i.i.d.}}{\sim} \Gamma(\xi_i, \Lambda_i/\xi_i)$,
where $\xi_i$ is a known parameter. That is, $f_2(\cdot \mid \Lambda_i)$ corresponds to the
density of a $\Gamma(\xi_i, \Lambda_i/\xi_i)$ distribution. □



Remarks 3.2:

• In the sequel, we only look at a single bank $i$ and therefore we could drop the index
$i$. However, we refrain from doing so in order to highlight the fact that we do not
consider the whole banking industry, but only a single bank.

• The parameters $\alpha_0$ and $\beta_0$ in Model Assumptions 3.1 a) are called
hyper-parameters (parameters for parameters); see, e.g., Bühlmann and Gisler [4], p. 38.
These parameters are estimated using the maximum likelihood method or the method of moments;
see for instance Shevchenko and Wüthrich [24], Section 5 and Appendix B.

• In Model Assumptions 3.1 c) we assume

$$\mathbb{E}[\vartheta_i^{(m)} \mid \Lambda_i] = \xi_i \cdot \frac{\Lambda_i}{\xi_i} = \Lambda_i, \qquad 1 \le m \le M_i, \tag{3.1}$$

that is, expert opinions are unbiased. A possible bias might only be recognized by
the regulator, who alone has an overview of the whole market. □

Note that the coefficient of variation of the conditional expert opinion
$\vartheta_i^{(m)} \mid \Lambda_i$ of company $i$ is
$\mathrm{Vco}(\vartheta_i^{(m)} \mid \Lambda_i) = (\mathrm{var}(\vartheta_i^{(m)} \mid \Lambda_i))^{1/2} / \mathbb{E}[\vartheta_i^{(m)} \mid \Lambda_i] = 1/\sqrt{\xi_i}$,
and thus is independent of $\Lambda_i$. This means that $\xi_i$, which characterizes the
uncertainty in the expert opinions, is independent of the true bank-specific $\Lambda_i$. For
simplicity, we have assumed that all experts have the same conditional coefficient of
variation and thus have the same credibility. Moreover, this allows for the estimation of
$\xi_i$ within each company $i$, e.g., by $\hat{\xi}_i = (\hat{\mu}_i / \hat{\sigma}_i)^2$ with

$$\hat{\mu}_i = \frac{1}{M_i} \sum_{m=1}^{M_i} \vartheta_i^{(m)} \quad \text{and} \quad \hat{\sigma}_i^2 = \frac{1}{M_i - 1} \sum_{m=1}^{M_i} \big(\vartheta_i^{(m)} - \hat{\mu}_i\big)^2. \tag{3.2}$$

In a more general framework the parameter $\xi_i$ can be estimated, e.g., by maximum
likelihood. If the credibility differs among the experts, then the coefficient of variation
$\mathrm{Vco}(\vartheta_i^{(m)} \mid \Lambda_i)$ should be estimated separately for each
expert $m$, $1 \le m \le M_i$. This may often be a (too) challenging issue in practice.
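The moment estimator of the expert parameter described around (3.2) can be sketched as follows. The helper name `estimate_xi` and the three opinions are hypothetical, and the use of the unbiased sample variance (divisor $M_i - 1$) is an assumption of this sketch.

```python
from statistics import mean, variance

def estimate_xi(opinions):
    """Moment estimator xi = (mu / sigma)^2 from at least two expert opinions."""
    if len(opinions) < 2:
        raise ValueError("need at least two opinions to estimate the spread")
    mu_hat = mean(opinions)
    sigma2_hat = variance(opinions)   # sample variance with divisor M - 1 (assumption)
    return mu_hat ** 2 / sigma2_hat

opinions = [0.5, 0.7, 0.9]            # hypothetical opinions on the intensity
xi_hat = estimate_xi(opinions)
vco_hat = xi_hat ** -0.5              # implied coefficient of variation 1 / sqrt(xi)
print(f"xi = {xi_hat:.2f}, Vco = {vco_hat:.3f}")
```

A large estimate of the parameter corresponds to experts who agree closely with each other and therefore receive more weight in the posterior.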

Remarks 3.3:

• $\Lambda_i$ is the risk characteristic of a risk cell in bank $i$. A priori, before we
have any observations, the banks are all the same, i.e., the $\Lambda_i$ are i.i.d.
Observations and expert opinions modify this characteristic according to the actual
experience in company $i$, which gives different posteriors
$\Lambda_i \mid N_{i,1}, \ldots, N_{i,K_i}, \vartheta_i^{(1)}, \ldots, \vartheta_i^{(M_i)}$.

• This model can be extended to a model where one allows for more flexibility in
the expert opinions. For convenience, we prefer that experts are conditionally i.i.d.,
given $\Lambda_i$. This has the advantage that there is only one parameter, $\xi_i$, that
needs to be estimated. □

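Using the notation from Section 2, we calculate the posterior density of $\Lambda_i$, given
the losses up to year $K_i$ and the expert opinions of $M_i$ experts. We introduce the
following notation for the loss database and the expert knowledge of bank $i$:

$$\mathbf{N}_i = (N_{i,1}, \ldots, N_{i,K_i}), \qquad \boldsymbol{\vartheta}_i = (\vartheta_i^{(1)}, \ldots, \vartheta_i^{(M_i)}).$$

Here and in what follows, we denote arithmetic means by

$$\bar{N}_i = \frac{1}{K_i} \sum_{k=1}^{K_i} N_{i,k}, \qquad \bar{\vartheta}_i = \frac{1}{M_i} \sum_{m=1}^{M_i} \vartheta_i^{(m)}, \qquad \text{etc.} \tag{3.3}$$

The posterior density $\hat{\pi}$ is given by the following theorem.

Theorem 3.4

Under Model Assumptions 3.1, the posterior density of $\Lambda_i$, given loss information
$\mathbf{N}_i$ and expert opinion $\boldsymbol{\vartheta}_i$, is given by

$$\hat{\pi}_{\Lambda_i}(\lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i) = \frac{(\omega/\phi)^{(\nu+1)/2}}{2 K_{\nu+1}(2\sqrt{\omega\phi})}\, \lambda_i^{\nu}\, e^{-\lambda_i \omega - \lambda_i^{-1} \phi}, \tag{3.4}$$

with

$$\nu = \alpha_0 - 1 - M_i \xi_i + K_i \bar{N}_i, \qquad \omega = V_i K_i + \frac{1}{\beta_0}, \qquad \phi = \xi_i M_i \bar{\vartheta}_i, \tag{3.5}$$

and

$$K_{\nu+1}(z) = \frac{1}{2} \int_0^{\infty} u^{\nu} e^{-z(u + 1/u)/2}\, du. \tag{3.6}$$

$K_{\nu}(z)$ is called a modified Bessel function of the third kind; see for instance
Abramowitz and Stegun [1], p. 375.

Proof: Set $\alpha_i = \xi_i$ and $\beta_i = \lambda_i/\xi_i$. Model Assumptions 3.1 applied
to (2.5) yield

$$
\begin{aligned}
\hat{\pi}_{\Lambda_i}(\lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i)
&\propto \lambda_i^{\alpha_0 - 1} e^{-\lambda_i/\beta_0} \prod_{k=1}^{K_i} e^{-V_i \lambda_i} \frac{(V_i \lambda_i)^{N_{i,k}}}{N_{i,k}!} \prod_{m=1}^{M_i} \frac{(\vartheta_i^{(m)})^{\alpha_i - 1}}{\beta_i^{\alpha_i}}\, e^{-\vartheta_i^{(m)}/\beta_i} \\
&\propto \lambda_i^{\alpha_0 - 1} e^{-\lambda_i/\beta_0} \prod_{k=1}^{K_i} e^{-V_i \lambda_i} \lambda_i^{N_{i,k}} \prod_{m=1}^{M_i} (\xi_i/\lambda_i)^{\xi_i}\, e^{-\vartheta_i^{(m)} \xi_i/\lambda_i} \\
&\propto \lambda_i^{\alpha_0 - 1 - M_i \xi_i + K_i \bar{N}_i} \exp\left( -\lambda_i \left( V_i K_i + \frac{1}{\beta_0} \right) - \frac{1}{\lambda_i}\, \xi_i M_i \bar{\vartheta}_i \right).
\end{aligned} \tag{3.7}
$$

□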
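The algebra behind (3.5) and (3.7) can be checked numerically: the product of prior, Poisson likelihood and expert densities should differ from the kernel $\lambda^{\nu} e^{-\lambda\omega - \phi/\lambda}$ only by a factor that does not depend on $\lambda$. The sketch below performs this check on the log scale; the function names and the input values are hypothetical.

```python
import math

def gig_parameters(alpha0, beta0, volume, xi, counts, opinions):
    """Posterior parameters (nu, omega, phi) as in (3.5)."""
    nu = alpha0 - 1.0 - len(opinions) * xi + sum(counts)
    omega = volume * len(counts) + 1.0 / beta0
    phi = xi * sum(opinions)
    return nu, omega, phi

def log_kernel(lam, nu, omega, phi):
    """Log of the unnormalized posterior lam^nu * exp(-lam*omega - phi/lam)."""
    return nu * math.log(lam) - lam * omega - phi / lam

def log_product_form(lam, alpha0, beta0, volume, xi, counts, opinions):
    """Log of prior * Poisson likelihood * Gamma expert densities (first line of (3.7))."""
    out = (alpha0 - 1.0) * math.log(lam) - lam / beta0
    for n in counts:
        out += n * math.log(volume * lam) - volume * lam - math.lgamma(n + 1.0)
    scale = lam / xi                                  # scale of the expert Gamma density
    for t in opinions:
        out += ((xi - 1.0) * math.log(t) - t / scale
                - xi * math.log(scale) - math.lgamma(xi))
    return out

# Hypothetical inputs: three years of counts and two expert opinions.
args = (3.407, 0.147, 1.0, 4.0, [0, 1, 2], [0.7, 0.5])
nu, omega, phi = gig_parameters(*args)
offsets = [log_product_form(lam, *args) - log_kernel(lam, nu, omega, phi)
           for lam in (0.3, 0.8, 2.0)]
print(nu, omega, phi, offsets)
```

If the three printed offsets agree, the two expressions are proportional in $\lambda$, which is what (3.7) states.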



Remarks 3.5:

• A distribution with density (3.4) is referred to as the generalized inverse Gaussian
distribution GIG$(\nu, \omega, \phi)$. This is a well-known distribution with many
applications in finance and risk management; see McNeil et al. [19]. The GIG has been analyzed
by many authors. A discussion is found, e.g., in Jørgensen [18]. The GIG belongs to
the popular class of subexponential distributions; see Embrechts [12] for a proof and
Embrechts et al. [13] for a detailed treatment of subexponential distributions. The
GIG with $\nu \le 1$ is a first hitting time distribution for certain time-homogeneous
processes; see for instance Jørgensen [18], Chapter 6. In particular, the (standard)
inverse Gaussian (i.e., the GIG with $\nu = -3/2$) is known by financial practitioners as
the distribution function determined by the first passage time of a Brownian motion.
Algorithms for generating realizations from a GIG are provided by Atkinson and
Dagpunar [8]; see also McNeil et al. [19] and the Appendix below.

• Unlike in the classical Poisson-Gamma case of combining two sources of information
(see Shevchenko and Wüthrich [24], Bühlmann and Gisler [4]), we obtain in (3.7) a
more complicated posterior distribution $\hat{\pi}$, which involves in the exponent both
$\lambda_i$ and $1/\lambda_i$. Note that expert opinions enter via the term $1/\lambda_i$
only. We give some basic properties of the GIG distribution below.

• Observe that the classical exponential dispersion family (EDF) with associated
conjugates (see Bühlmann and Gisler [4], Chapter 2.5) allows for a natural extension
to GIG-like distributions. In this sense the GIG distributions enlarge the classical
Bayesian inference theory on the exponential dispersion family. □

For our purposes it is interesting to observe how the posterior density transforms when
new data from a newly observed year arrive. Let $\nu_k$, $\omega_k$ and $\phi_k$ denote the
parameters for the observations $(N_{i,1}, \ldots, N_{i,k})$ after $k$ accounting years.
Implementation of the update process is then given by the following equalities (assuming
that expert opinions do not change).

Information update process. Year $k$ → year $k+1$:

$$\nu_{k+1} = \nu_k + N_{i,k+1}, \qquad \omega_{k+1} = \omega_k + V_i, \qquad \phi_{k+1} = \phi_k. \tag{3.8}$$

Obviously, the information update process has a very simple form and only the parameter
$\nu$ is affected by the new observation $N_{i,k+1}$. The posterior density (3.7) does not
change its type every time new data arrive and hence is easily calculated.
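The yearly update (3.8) amounts to a one-line recursion. The sketch below starts from the parameters implied by (3.5) with no loss data and one expert, then applies three hypothetical annual counts; the function name and all inputs are illustrative.

```python
def update_year(nu, omega, phi, new_count, volume):
    """One step of (3.8): only nu depends on the newly observed count."""
    return nu + new_count, omega + volume, phi

# Hypothetical inputs: prior hyper-parameters, one expert opinion, V_i = 1.
alpha0, beta0, volume, xi, opinion = 3.407, 0.147, 1.0, 4.0, 0.7
state = (alpha0 - 1.0 - xi, 1.0 / beta0, xi * opinion)   # (nu_0, omega_0, phi_0)
history = [state]
for count in [0, 1, 2]:                                  # three observed years
    state = update_year(*state, count, volume)
    history.append(state)
print(history[-1])
```

Because the recursion only adds the new count and the scaling factor, the full loss history need not be stored once the current parameter triple is known.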

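The moments of a GIG cannot be given in closed form by elementary functions. However,
for $\alpha \ge 1$, all moments are given in terms of Bessel functions:

$$\mathbb{E}[\Lambda_i^{\alpha} \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i] = \left( \frac{\phi}{\omega} \right)^{\alpha/2} \frac{K_{\nu+1+\alpha}(2\sqrt{\omega\phi})}{K_{\nu+1}(2\sqrt{\omega\phi})}. \tag{3.9}$$

A useful notation is the following:

$$R_{\nu}(z) = \frac{K_{\nu+1}(z)}{K_{\nu}(z)}. \tag{3.10}$$

Then it follows for the posterior expected number of losses

$$\mathbb{E}[\Lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i] = \sqrt{\frac{\phi}{\omega}}\, R_{\nu+1}(2\sqrt{\omega\phi}), \tag{3.11}$$

and for the higher moments

$$\mathbb{E}[\Lambda_i^{\alpha} \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i] = \left( \frac{\phi}{\omega} \right)^{\alpha/2} \prod_{k=1}^{\alpha} R_{\nu+k}(2\sqrt{\omega\phi}), \qquad \alpha = 2, 3, \ldots \tag{3.12}$$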
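The Bessel-function moments can be evaluated without a special-function library. The sketch below computes $K_{\nu}(z)$ by the trapezoidal rule from the representation $K_{\nu}(z) = \int_0^{\infty} e^{-z\cosh t}\cosh(\nu t)\,dt$, which is (3.6) after the substitution $u = e^t$, and then forms the moment ratio. The function names, the step size, the integration limit and the parameter values are assumptions of this sketch, suited to moderate values of $z$ and $\nu$.

```python
import math

def bessel_k(order, z, step=0.01, upper=15.0):
    """Modified Bessel function K_order(z), z > 0, by the trapezoidal rule."""
    total = 0.5 * math.exp(-z)                       # half weight at t = 0
    for j in range(1, int(upper / step) + 1):
        t = j * step
        total += math.exp(-z * math.cosh(t)) * math.cosh(order * t)
    return total * step

def gig_moment(power, nu, omega, phi):
    """E[Lambda^power] for the density proportional to x^nu * exp(-omega*x - phi/x)."""
    z = 2.0 * math.sqrt(omega * phi)
    ratio = bessel_k(nu + 1.0 + power, z) / bessel_k(nu + 1.0, z)
    return (phi / omega) ** (power / 2.0) * ratio

nu, omega, phi = -1.593, 1.0 / 0.147, 2.8            # hypothetical parameter values
posterior_mean = gig_moment(1, nu, omega, phi)
posterior_sd = math.sqrt(gig_moment(2, nu, omega, phi) - posterior_mean ** 2)
print(f"posterior mean {posterior_mean:.4f}, standard deviation {posterior_sd:.4f}")
```

The quadrature can be checked against the closed form $K_{1/2}(z) = \sqrt{\pi/(2z)}\,e^{-z}$ before it is used for other orders.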

We are clearly interested in robust prediction of the bank-specific Poisson parameter and
thus the Bayesian estimator (3.11) is a promising candidate within this operational risk
framework. The examples below show that, in practice, (3.11) outperforms other classical
estimators. To interpret (3.11) in more detail, we make use of asymptotic properties.

Here and throughout the paper, $f(x) \sim g(x)$, $x \to a$, means that
$\lim_{x \to a} f(x)/g(x) = 1$. Lemma B.1 in Appendix B basically says that
$R_{\nu^2}(2\nu) \sim \nu$ is asymptotically linear for $\nu \to \infty$.
This is the key in the proof of Theorem 3.6 and yields a full asymptotic interpretation of
the Bayesian estimator (3.11).

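Theorem 3.6

Under Model Assumptions 3.1, the following asymptotic relations hold $\mathbb{P}$-almost surely:

a) Assume, given $\Lambda_i = \lambda_i$, $N_{i,k} \overset{\text{i.i.d.}}{\sim} \mathrm{Pois}(V_i \lambda_i)$ and $\vartheta_i^{(m)} \overset{\text{i.i.d.}}{\sim} \Gamma(\xi_i, \lambda_i/\xi_i)$.
For $K_i \to \infty$: $\mathbb{E}[\Lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i] \to \mathbb{E}[N_{i,k} \mid \Lambda_i = \lambda_i]/V_i = \lambda_i$.

b) For $\mathrm{Vco}(\vartheta_i^{(m)} \mid \Lambda_i) \to 0$: $\mathbb{E}[\Lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i] \to \vartheta_i^{(m)}$, $m = 1, \ldots, M_i$.

c) Assume, given $\Lambda_i = \lambda_i$, $N_{i,k} \overset{\text{i.i.d.}}{\sim} \mathrm{Pois}(V_i \lambda_i)$ and $\vartheta_i^{(m)} \overset{\text{i.i.d.}}{\sim} \Gamma(\xi_i, \lambda_i/\xi_i)$.
For $M_i \to \infty$: $\mathbb{E}[\Lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i] \to \mathbb{E}[\vartheta_i^{(m)} \mid \Lambda_i = \lambda_i] = \lambda_i$.

d) For $\mathrm{Vco}(\vartheta_i^{(m)} \mid \Lambda_i) \to \infty$, $m = 1, \ldots, M_i$:
$\mathbb{E}[\Lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i] \to \frac{1}{V_i K_i \beta_0 + 1}\, \mathbb{E}[\Lambda_i] + \left( 1 - \frac{1}{V_i K_i \beta_0 + 1} \right) \frac{\bar{N}_i}{V_i}$.

e) For $\mathbb{E}[\Lambda_i] = \text{constant}$ and $\mathrm{Vco}(\Lambda_i) \to 0$: $\mathbb{E}[\Lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i] \to \mathbb{E}[\Lambda_i]$.

Proof: See the Appendix. □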

Theorem 3.6 yields a natural interpretation of the posterior density (3.4) and its expected
value (3.11). As the number of observations increases, we give more weight to them and
in the limit $K_i \to \infty$ (case a) we completely believe in the observations $N_{i,k}$
and we neglect a priori information and expert opinion. On the other hand, the more the
coefficient of variation of the expert opinions decreases, the more weight is given to them
(case b). In Model Assumptions 3.1 we assume experts to be conditionally independent. In
practice, however, even for $\mathrm{Vco}(\vartheta_i^{(m)} \mid \Lambda_i) \to 0$, the
variance of $\bar{\vartheta}_i \mid \Lambda_i$ cannot be made arbitrarily
small when increasing the number of experts, as there is always a positive covariance
term due to positive dependence between experts. Since we predict random variables, we
never have "perfect diversification", that is, in practical applications we would probably
question property c.

Conversely, if experts become less credible in terms of having an increasing coefficient of 
variation, our model behaves as if the experts do not exist (case d). The Bayes estimator 
is then a weighted sum of prior and posterior information with appropriate credibil- 
ity weights. This is the classical credibility result obtained from Bayesian inference on 
the exponential dispersion family with two sources of information; see Shevchenko and 
Wüthrich [24], Formula (12).
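The credibility form of case d can be made plausible by a short formal calculation. The following is only a sketch, not the proof referred to in the theorem: it assumes that the limit $\xi_i \to 0$ may be taken inside the posterior kernel, and it uses the parameters as defined in (3.5).

```latex
% Formal limit xi_i -> 0, i.e. Vco(theta_i^{(m)} | Lambda_i) = 1/sqrt(xi_i) -> infinity.
% Then phi = xi_i M_i \bar\vartheta_i -> 0 and nu -> alpha_0 - 1 + K_i \bar N_i,
% so the kernel in (3.7) reduces to a Gamma kernel:
\hat\pi_{\Lambda_i}(\lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i)
  \;\propto\; \lambda_i^{\alpha_0 + K_i \bar N_i - 1}
  \exp\Big\{ -\lambda_i \Big( V_i K_i + \tfrac{1}{\beta_0} \Big) \Big\}.
% The mean of a Gamma distribution is shape divided by rate:
\frac{\alpha_0 + K_i \bar N_i}{V_i K_i + 1/\beta_0}
  = \frac{1}{V_i K_i \beta_0 + 1}\, \alpha_0 \beta_0
  + \frac{V_i K_i \beta_0}{V_i K_i \beta_0 + 1}\, \frac{\bar N_i}{V_i}.
```

Since $\alpha_0 \beta_0 = \mathbb{E}[\Lambda_i]$, the right-hand side is a weighted average of the prior mean and the observed average frequency $\bar{N}_i / V_i$, with the weight on the data increasing in $V_i K_i \beta_0$.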

Of course, if the coefficient of variation of the prior distribution (i.e., of the whole banking 
industry) vanishes, the external data are not affected by internal data and expert opinion 
(case e). 

In this sense, Theorem 3.6 shows that our model behaves exactly as we would expect and
require in practice. Thus, we have good reasons to believe that it provides an adequate
model to combine internal observations with relevant external data and expert opinions,
as required by many risk managers.

Note that one can even go further and generalize the results from this section in a natural
way to a Poisson-Gamma-GIG model, i.e., where the prior distribution is a GIG. Then
the posterior distribution is again a GIG (see also Model Assumptions 4.6 below).

3.2 Implementation and practical application 

In this section we apply the above theory to a concrete example. The Bayesian estimator
(3.11) derived above is easily implemented in practice. The following example extends
the example displayed in Figure 1 in Shevchenko and Wüthrich [24].

Example 3.7 Assume that external data (e.g., provided by external databases or the
regulator) estimate the parameter of the loss frequency (i.e., the Poisson parameter
$\Lambda$), which has a Gamma distribution $\Lambda \sim \Gamma(\alpha_0, \beta_0)$, as
$\mathbb{E}[\Lambda] = \alpha_0 \beta_0 = 0.5$ and $\mathbb{P}[0.25 \le \Lambda \le 0.75] = 2/3$.
Then, the parameters of the prior Gamma distribution are $\alpha_0 \approx 3.407$ and
$\beta_0 \approx 0.147$; see Shevchenko and Wüthrich [24], Section 4.1.



Now, we consider one particular bank $i$:

i) One expert gives the opinion $\vartheta = 0.7$. For simplicity, we consider
in this example one single expert only and hence, the coefficient of variation is
not estimated using (3.2), but given a priori, e.g., by the regulator:
$\mathrm{Vco}(\vartheta \mid \Lambda) = (\mathrm{var}(\vartheta \mid \Lambda))^{1/2} / \mathbb{E}[\vartheta \mid \Lambda] = 0.5$,
i.e., $\xi = 4$.

ii) The observations of the annual number of losses are given as follows (sampled from a
Poisson distribution with parameter $\lambda = 0.6$; this is the dataset used in Shevchenko
and Wüthrich [24]):

Year               1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Number of losses   0  0  0  0  1  0  1  1  1  0  2  1  1  2

This means that a priori we have a frequency parameter distributed as
$\Gamma(\alpha_0, \beta_0)$ with mean $\alpha_0 \beta_0 = 0.5$. The true parameter for this
institution is $\lambda = 0.6$, i.e., it does worse than the average institution. However,
our expert has an even worse opinion of his institution, namely $\vartheta = 0.7$.

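We compare the pure maximum likelihood estimator
$\hat{\lambda}_k^{\mathrm{MLE}} = \frac{1}{k} \sum_{i=1}^{k} N_i$ and the Bayesian estimator

$$\hat{\lambda}_k^{\mathrm{SW}} = \mathbb{E}[\Lambda \mid N_1, \ldots, N_k], \tag{3.13}$$

proposed in Shevchenko and Wüthrich [24] (without expert opinion), with the Bayesian
estimator derived in formula (3.11), including expert opinion:

$$\hat{\lambda}_k = \mathbb{E}[\Lambda \mid N_1, \ldots, N_k, \vartheta]. \tag{3.14}$$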

The results are plotted in Figure 1. The estimator (3.11) shows a much more stable
behavior around the true value $\lambda = 0.6$, due to the use of the prior information
(market data) and the expert opinions. Given adequate expert opinions, the Bayesian estimator
(3.11) clearly outperforms the other estimators, particularly if only a few data points are
available.

One could think that this is only the case when the experts' estimates are appropriate.
However, even if experts fairly under- (or over-) estimate the true parameter $\lambda$, the
method presented in this paper performs better for our dataset than the other mentioned
methods when only a few data points are available. In Figure 2 we display the same
estimators, but where the expert's opinion is $\vartheta = 0.4$, which clearly underestimates
the true expected value 0.6. In Figure 1, $\hat{\lambda}_k$ gives better estimates when
compared to $\hat{\lambda}_k^{\mathrm{SW}}$. Observe that also in Figure 2 $\hat{\lambda}_k$
gives more appropriate estimates than $\hat{\lambda}_k^{\mathrm{SW}}$. Though the expert is
too optimistic, $\hat{\lambda}_k$ manages to correct $\hat{\lambda}_k^{\mathrm{MLE}}$
$(k \le 10)$, which is clearly too low. □
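The three estimators compared in Example 3.7 can be reproduced approximately with the sketch below. It makes several assumptions: the scaling factor is set to $V = 1$, only the 14 annual counts legible in the table are used, the estimator without expert opinion is taken to be the classical Poisson-Gamma posterior mean, and the posterior mean with expert opinion is obtained by direct numerical integration of the kernel in (3.7) rather than through the Bessel ratio in (3.11). The function name `posterior_mean` is hypothetical.

```python
import math

def posterior_mean(nu, omega, phi, step=0.01, lo=-12.0, hi=6.0):
    """Mean of the density proportional to x^nu * exp(-omega*x - phi/x), omega, phi > 0.

    Trapezoidal quadrature in t = log(x); the integrand vanishes at both ends.
    """
    num = den = 0.0
    for j in range(int((hi - lo) / step) + 1):
        t = lo + j * step
        x = math.exp(t)
        w = math.exp((nu + 1.0) * t - omega * x - phi / x)   # kernel times dx/dt
        den += w
        num += w * x
    return num / den

ALPHA0, BETA0, XI, THETA = 3.407, 0.147, 4.0, 0.7     # values quoted in Example 3.7
counts = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 2, 1, 1, 2]   # the 14 legible annual counts

rows = []
for k in range(len(counts) + 1):
    total = sum(counts[:k])
    omega = k + 1.0 / BETA0                           # V = 1 (assumption)
    mle = total / k if k > 0 else None                # undefined before any data
    bayes_sw = (ALPHA0 + total) / omega               # Gamma posterior mean, no expert
    bayes_expert = posterior_mean(ALPHA0 - 1.0 - XI + total, omega, XI * THETA)
    rows.append((k, mle, bayes_sw, bayes_expert))

for k, mle, sw, ex in rows:
    mle_text = "  n/a" if mle is None else f"{mle:5.3f}"
    print(f"k={k:2d}  MLE={mle_text}  Bayes(no expert)={sw:5.3f}  Bayes(expert)={ex:5.3f}")
```

Changing `THETA` to 0.4 gives the setting of the second comparison, in which the expert underestimates the true intensity.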
Figure 1: The Bayes estimator $(\hat{\lambda}_k)_{k=0,\ldots,15}$ includes internal data
simulated from Poisson(0.6), external data $\Lambda$ with $\mathbb{E}[\Lambda] = 0.5$ and
expert opinion $\vartheta = 0.7$ (o). It is compared with the Bayes estimator
$\hat{\lambda}_k^{\mathrm{SW}}$ proposed in Shevchenko and Wüthrich [24] (△) and the
classical maximum likelihood estimator (+).



This example yields a typical picture observed in numerical experiments that demonstrates
that the Bayes estimator (3.11) is often more suitable and stable than maximum likelihood
estimators based on internal data only.

Remark 3.8 Note that in this example the prior distribution as well as the expert
opinion do not change over time. However, as soon as new information is available or
when new risk management tools are in place, the corresponding parameters may be
easily adjusted. □

Figure 2: The same estimators as in Figure 1 are displayed, but where the
expert underestimates the true $\lambda = 0.6$ by $\vartheta = 0.4$.



3.3 Alternative estimator using the mode 

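Instead of calculating the mean of the GIG$(\nu, \omega, \phi)$ as we did in the estimator
(3.11), we could use the mode of the distribution, i.e., the point where the density function
is maximum. The mode of a GIG differs only slightly from the expected value for large
$|\nu|$. In particular, one proves, e.g., that for $X \sim$ GIG$(\nu, \omega, \phi)$ we have

$$\mathrm{mode}(X) \sim \mathbb{E}[X] \quad \text{for } \nu \to \infty. \tag{3.15}$$

The mode of a GIG$(\nu, \omega, \phi)$ is easily calculated by

$$\frac{\partial}{\partial x} \left( x^{\nu} e^{-\omega x - \phi/x} \right) = 0. \tag{3.16}$$

Hence,

$$\mathrm{mode}(X) = \frac{1}{2\omega} \left( \nu + \sqrt{\nu^2 + 4\omega\phi} \right), \tag{3.17}$$

which gives us a good approximation to the mean for large $\nu$. Thus, we have

$$\mathrm{mode}(\Lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i) = \frac{\nu}{2\omega} \left( 1 + \mathrm{sign}(\nu) \sqrt{1 + \frac{4\omega\phi}{\nu^2}} \right), \tag{3.18}$$

where $\nu$, $\omega$ and $\phi$ are given by equations (3.5). Due to
$4\omega\phi/\nu^2 \to 0$ for $K_i \to \infty$, $M_i \to \infty$, $M_i \to 0$ or
$\xi_i \to 0$, we approximate $\sqrt{1 + 2x} \approx 1 + x$, $x \to 0$, and hence

$$\mathrm{mode}(\Lambda_i \mid \mathbf{N}_i, \boldsymbol{\vartheta}_i) \approx \frac{\nu}{\omega}\, 1_{\{\nu > 0\}} + \frac{\phi}{|\nu|}. \tag{3.19}$$

With (3.19) we again get the results from Theorem 3.6 in an elementary manner avoiding
Bessel functions.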
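The exact mode (3.17) and its first-order approximation (3.19) are straightforward to compare numerically. In the sketch below the function names and the parameter values are hypothetical; the values are chosen so that $4\omega\phi/\nu^2$ is below one.

```python
import math

def gig_mode(nu, omega, phi):
    """Exact mode: the positive root of omega*x^2 - nu*x - phi = 0, cf. (3.17)."""
    return (nu + math.sqrt(nu * nu + 4.0 * omega * phi)) / (2.0 * omega)

def gig_mode_approx(nu, omega, phi):
    """First-order approximation (3.19); requires nu != 0."""
    leading = nu / omega if nu > 0 else 0.0
    return leading + phi / abs(nu)

nu, omega, phi = 50.0, 100.0, 2.8     # hypothetical values, 4*omega*phi/nu^2 = 0.448
exact = gig_mode(nu, omega, phi)
approx = gig_mode_approx(nu, omega, phi)
print(f"exact mode {exact:.4f}, approximation {approx:.4f}")
```

The quadratic in the docstring follows from setting the derivative in (3.16) to zero and multiplying by $x^{2-\nu} e^{\omega x + \phi/x}$.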



4 Loss Severities 

In the previous section we presented a method to quantify the operational risk loss fre- 
quency. We now turn to quantification of the severity distribution for operational risk. 
This is done in this section for different types of subexponential models. 

4.1 Lognormal model (Model 1 for severities) 
Model Assumptions 4.1 (Lognormal-normal-normal) 

Let us assume the following severity model for operational risk of a risk cell in bank $i$,
$1 \le i \le I$:

a) Let $\Delta_i \sim \mathcal{N}(\mu_0, \sigma_0)$ be a normally distributed random variable
with parameters $\mu_0$, $\sigma_0$, which are estimated from (external) market data, i.e.,
$\pi(\boldsymbol{\gamma})$ in (2.5) is the density of $\mathcal{N}(\mu_0, \sigma_0)$.

b) The losses $k = 1, \ldots, K_i$ from institution $i$ are assumed to be conditionally (on
$\Delta_i$) i.i.d. lognormally distributed:
$X_{i,1}, \ldots, X_{i,K_i} \mid \Delta_i \overset{\text{i.i.d.}}{\sim} \mathrm{LN}(\Delta_i, \sigma_i)$,
where $\sigma_i$ is assumed known. That is, $f_1(\cdot \mid \Delta_i)$ in (2.5) corresponds
to the density of a $\mathrm{LN}(\Delta_i, \sigma_i)$ distribution.

c) We assume that bank $i$ has $M_i$ experts with opinions $\vartheta_i^{(m)}$,
$1 \le m \le M_i$, about the parameter $\Delta_i$ with
$\vartheta_i^{(m)} \mid \Delta_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\Delta_i, \xi_i)$,
where $\xi_i$ is a parameter estimated using expert opinion data. That is,
$f_2(\cdot \mid \Delta_i)$ corresponds to the density of a $\mathcal{N}(\Delta_i, \xi_i)$
distribution. □

Remarks 4.2:

• For $M_i \ge 2$, the parameter $\xi_i$ is, e.g., estimated by the standard deviation of
$\vartheta_i^{(m)}$:

$$\hat{\xi}_i = \left( \frac{1}{M_i - 1} \sum_{m=1}^{M_i} \big( \vartheta_i^{(m)} - \bar{\vartheta}_i \big)^2 \right)^{1/2}. \tag{4.1}$$

• The hyper-parameters $\mu_0$ and $\sigma_0$ are estimated from market data, e.g., by
maximum likelihood estimation or by the method of moments.

• In practice one often uses an ad hoc estimate for $\sigma_i$, $1 \le i \le I$, which
usually is based on expert opinion only. However, one could think of a Bayesian approach for
$\sigma_i$, but then an analytical formula for the posterior distribution in general does not
exist. The posterior distribution then needs to be calculated, for example, by the
Markov chain Monte Carlo method; see again Peters and Sisson [23] or Gilks et
al. [17]. □



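Under Model Assumptions 4.1, the posterior density is given by

$$
\begin{aligned}
\hat{\pi}_{\Delta_i}(\delta_i \mid \mathbf{X}_i, \boldsymbol{\vartheta}_i)
&\propto \frac{1}{\sigma_0 \sqrt{2\pi}} \exp\left( -\frac{(\delta_i - \mu_0)^2}{2\sigma_0^2} \right)
\prod_{k=1}^{K_i} \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left( -\frac{(\log X_{i,k} - \delta_i)^2}{2\sigma_i^2} \right)
\prod_{m=1}^{M_i} \frac{1}{\xi_i \sqrt{2\pi}} \exp\left( -\frac{(\vartheta_i^{(m)} - \delta_i)^2}{2\xi_i^2} \right) \\
&\propto \exp\left[ -\left( \frac{(\delta_i - \mu_0)^2}{2\sigma_0^2} + \sum_{k=1}^{K_i} \frac{(\log X_{i,k} - \delta_i)^2}{2\sigma_i^2} + \sum_{m=1}^{M_i} \frac{(\vartheta_i^{(m)} - \delta_i)^2}{2\xi_i^2} \right) \right] \\
&\propto \exp\left[ -\frac{(\delta_i - \hat{\mu})^2}{2\hat{\sigma}^2} \right],
\end{aligned} \tag{4.2}
$$

with

$$\hat{\sigma}^2 = \left( \frac{1}{\sigma_0^2} + \frac{K_i}{\sigma_i^2} + \frac{M_i}{\xi_i^2} \right)^{-1}, \tag{4.3}$$

and

$$\hat{\mu} = \hat{\sigma}^2 \times \left( \frac{\mu_0}{\sigma_0^2} + \frac{1}{\sigma_i^2} \sum_{k=1}^{K_i} \log X_{i,k} + \frac{1}{\xi_i^2} \sum_{m=1}^{M_i} \vartheta_i^{(m)} \right). \tag{4.4}$$

In summary we have the following theorem.

Theorem 4.3

Under Model Assumptions 4.1 and with the notation
$\overline{\log X_i} = \frac{1}{K_i} \sum_{k=1}^{K_i} \log X_{i,k}$, the posterior
distribution of $\Delta_i$, given loss information $\mathbf{X}_i$ and expert opinion
$\boldsymbol{\vartheta}_i$, is a normal distribution $\mathcal{N}(\hat{\mu}, \hat{\sigma})$
with

$$\hat{\sigma}^2 = \left( \frac{1}{\sigma_0^2} + \frac{K_i}{\sigma_i^2} + \frac{M_i}{\xi_i^2} \right)^{-1} \tag{4.5}$$

and

$$\hat{\mu} = \mathbb{E}[\Delta_i \mid \mathbf{X}_i, \boldsymbol{\vartheta}_i] = \omega_1 \mu_0 + \omega_2\, \overline{\log X_i} + \omega_3\, \bar{\vartheta}_i. \tag{4.6}$$

The credibility weights are $\omega_1 = \hat{\sigma}^2/\sigma_0^2$,
$\omega_2 = \hat{\sigma}^2 K_i/\sigma_i^2$ and $\omega_3 = \hat{\sigma}^2 M_i/\xi_i^2$.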

This theorem yields a natural interpretation of the considered model. The estimator in
(4.6) weights the internal and external data as well as the expert opinion in an appropriate
manner. Observe that under Model Assumptions 4.1 we can explicitly calculate the mean
of the posterior distribution. This is different from the frequency model in Section 3.
That is, we have an exact calculation and for the interpretation of the terms we do not
rely on an asymptotic theorem as in Theorem 3.6. However, interpretation of the terms is
exactly the same as in Theorem 3.6. The more credible the information, the higher is the
credibility weight in (4.6). Hence, again, this theorem shows that our model is appropriate
for combining internal observations, relevant external data and expert opinions.
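
As an illustration of Theorem 4.3, the following Python sketch evaluates the posterior variance (4.5), the credibility weights and the posterior mean (4.6). The function name `lognormal_posterior` and all numerical inputs are hypothetical and chosen only for illustration; the standard deviations of the log-losses and of the expert opinions are assumed to be known.

```python
import math

def lognormal_posterior(mu0, sigma0, sigma_i, xi_i, losses, opinions):
    """Posterior mean, variance and credibility weights of Delta_i (Theorem 4.3).

    mu0, sigma0 : prior mean and standard deviation (external data)
    sigma_i     : standard deviation of the log-losses (assumed known)
    xi_i        : standard deviation of the expert opinions (assumed known)
    """
    K, M = len(losses), len(opinions)
    # (4.5): the posterior precision is the sum of the three precisions
    var_hat = 1.0 / (1.0 / sigma0**2 + K / sigma_i**2 + M / xi_i**2)
    w1 = var_hat / sigma0**2
    w2 = var_hat * K / sigma_i**2
    w3 = var_hat * M / xi_i**2
    mean_log = sum(math.log(x) for x in losses) / K if K else 0.0
    mean_opinion = sum(opinions) / M if M else 0.0
    # (4.6): credibility-weighted average of the three sources
    mu_hat = w1 * mu0 + w2 * mean_log + w3 * mean_opinion
    return mu_hat, var_hat, (w1, w2, w3)

# Hypothetical inputs, for illustration only
mu_hat, var_hat, weights = lognormal_posterior(
    mu0=2.0, sigma0=0.8, sigma_i=1.5, xi_i=0.5,
    losses=[3.2, 11.0, 7.5, 40.1], opinions=[2.4, 2.1])
print(mu_hat, var_hat, weights)
```

The three weights always sum to one, so the posterior mean is a convex combination of the prior mean, the average log-loss and the average expert opinion.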

4.2 Pareto model (Model 2 for severities) 

Model Assumptions 4.4 (Pareto-Gamma-Gamma) 

Let us assume the following severity model for a particular operational risk cell of bank
$i$, $1 \le i \le I$:

a) Let $\Gamma_i \sim \Gamma(\alpha_0, \beta_0)$ be a Gamma distributed random variable with parameters $\alpha_0$, $\beta_0$,
which are estimated from (external) market data, i.e., $\pi(\gamma)$ in (2.5) is the density
of a $\Gamma(\alpha_0, \beta_0)$ distribution.

b) The losses $X_{i,k}$, $k = 1, \ldots, K_i$, from institution $i$ are assumed to be conditionally (on $\Gamma_i$)
i.i.d. Pareto distributed: $X_{i,1}, \ldots, X_{i,K_i} \mid \Gamma_i \overset{\text{i.i.d.}}{\sim} \text{Pareto}(\Gamma_i, L_i)$, where the threshold
$L_i > 0$ is assumed to be known and fixed. That is, $f_1(\cdot \mid \Gamma_i)$ in (2.5) corresponds to
the density of a $\text{Pareto}(\Gamma_i, L_i)$ distribution.

c) We assume that bank $i$ has $M_i$ experts with opinions $\vartheta_i^{(m)}$, $1 \le m \le M_i$, about
the parameter $\Gamma_i$ with $\vartheta_i^{(m)} \mid \Gamma_i \overset{\text{i.i.d.}}{\sim} \Gamma(\alpha_i = \xi_i, \beta_i = \Gamma_i/\xi_i)$, where $\xi_i$ is a parameter
estimated using expert opinion data; see (3.2). That is, $f_2(\cdot \mid \Gamma_i)$ corresponds to the
density of a $\Gamma(\xi_i, \Gamma_i/\xi_i)$ distribution. □

Under Model Assumptions 4.4 the posterior density is given by



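$$
\begin{aligned}
\pi_{\Gamma_i}(\gamma_i \mid X_i, \vartheta_i)
&\propto \gamma_i^{\alpha_0 - 1} e^{-\gamma_i/\beta_0}
\prod_{k=1}^{K_i} \frac{\gamma_i}{L_i}\left(\frac{X_{i,k}}{L_i}\right)^{-(\gamma_i + 1)}
\prod_{m=1}^{M_i} \frac{(\vartheta_i^{(m)}/\beta_i)^{\alpha_i - 1}}{\beta_i}\, e^{-\vartheta_i^{(m)}/\beta_i} \\
&\propto \gamma_i^{\alpha_0 - 1 - M_i\xi_i + K_i}
\exp\left\{-\gamma_i\left(\frac{1}{\beta_0} + \sum_{k=1}^{K_i} \log\frac{X_{i,k}}{L_i}\right) - \frac{1}{\gamma_i}\,\xi_i M_i \bar\vartheta_i\right\},
\end{aligned}
\tag{4.7}
$$

where $\alpha_i = \xi_i$ and $\beta_i = \gamma_i/\xi_i$.

Hence, again, the posterior distribution is a GIG and it has the nice property that the
term $\gamma_i$ in the exponent in (4.7) is only affected by the internal observations, whereas the
term $1/\gamma_i$ is driven by the expert opinions.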

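Theorem 4.5

Under Model Assumptions 4.4 the posterior density of $\Gamma_i$, given loss information $X_i$ and
expert opinion $\vartheta_i$, is given by

$$
\pi_{\Gamma_i}(\gamma_i \mid X_i, \vartheta_i) = \frac{(\omega/\phi)^{(\nu+1)/2}}{2K_{\nu+1}(2\sqrt{\omega\phi})}\, \gamma_i^{\nu}\, e^{-\gamma_i\omega - \gamma_i^{-1}\phi},
\tag{4.8}
$$

with

$$
\begin{aligned}
\nu &= \alpha_0 - 1 - M_i\xi_i + K_i, \\
\omega &= \frac{1}{\beta_0} + \sum_{k=1}^{K_i} \log\frac{X_{i,k}}{L_i}, \\
\phi &= \xi_i M_i \bar\vartheta_i.
\end{aligned}
\tag{4.9}
$$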

It seems natural to generalize this result by substituting the prior Gamma distribution 
by a GIG as follows. 

Model Assumptions 4.6 (Pareto-Gamma-GIG) 

Let us assume the following severity model for a particular operational risk cell of bank
$i$, $1 \le i \le I$:

a) Let $\Gamma_i \sim \text{GIG}(\nu_0, \omega_0, \phi_0)$ be a generalized inverse Gaussian distributed random
variable with parameters $\nu_0$, $\omega_0$, $\phi_0$, which are estimated from (external) market
data, i.e., $\pi(\gamma)$ in (2.5) is the density of a $\text{GIG}(\nu_0, \omega_0, \phi_0)$ distribution.

b) The losses $X_{i,k}$, $k = 1, \ldots, K_i$, from bank $i$ are assumed to be conditionally (on $\Gamma_i$)
i.i.d. Pareto distributed: $X_{i,1}, \ldots, X_{i,K_i} \mid \Gamma_i \overset{\text{i.i.d.}}{\sim} \text{Pareto}(\Gamma_i, L_i)$, where the threshold
$L_i > 0$ is assumed to be known and fixed. That is, $f_1(\cdot \mid \Gamma_i)$ in (2.5) corresponds to
the density of a $\text{Pareto}(\Gamma_i, L_i)$ distribution.

c) We assume that bank $i$ has $M_i$ experts with opinions $\vartheta_i^{(m)}$, $1 \le m \le M_i$, about
the parameter $\Gamma_i$ with $\vartheta_i^{(m)} \mid \Gamma_i \overset{\text{i.i.d.}}{\sim} \Gamma(\alpha_i = \xi_i, \beta_i = \Gamma_i/\xi_i)$, where $\xi_i$ is a parameter
estimated using expert opinion data. That is, $f_2(\cdot \mid \Gamma_i)$ corresponds to the density of
a $\Gamma(\xi_i, \Gamma_i/\xi_i)$ distribution. □



Under Model Assumptions 4.6 the a posteriori density $\pi_{\Gamma_i}(\gamma_i \mid X_i, \vartheta_i)$ is given by (4.8)
with

$$
\begin{aligned}
\nu &= \nu_0 - M_i\xi_i + K_i, \\
\omega &= \omega_0 + \sum_{k=1}^{K_i} \log\frac{X_{i,k}}{L_i}, \\
\phi &= \phi_0 + \xi_i M_i \bar\vartheta_i.
\end{aligned}
\tag{4.10}
$$

Hence, again, the posterior distribution is given by a GIG. Note that for $\phi_0 = 0$, the GIG
is a Gamma distribution and hence we are in the Pareto-Gamma-Gamma situation of
Model Assumptions 4.4.

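The following theorem gives us a natural interpretation of the Bayesian estimator

$$
E[\Gamma_i \mid X_i, \vartheta_i] = \sqrt{\frac{\phi}{\omega}}\, R_{\nu+1}\left(2\sqrt{\omega\phi}\right).
\tag{4.11}
$$

Denote the maximum likelihood estimator of the Pareto tail index $\Gamma_i$ by

$$
\gamma_i^{\text{MLE}} = \frac{K_i}{\sum_{k=1}^{K_i} \log\frac{X_{i,k}}{L_i}}.
\tag{4.12}
$$

Then, completely analogous to Theorem 3.6, we obtain the following theorem.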
Theorem 4.7 

Under Model Assumptions 4.4 and 4.6 the following asymptotic relations hold P-almost
surely:

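a) Assume, given $\Gamma_i = \gamma_i$, $X_{i,k} \overset{\text{i.i.d.}}{\sim} \text{Pareto}(\gamma_i, L_i)$ and $\vartheta_i^{(m)} \overset{\text{i.i.d.}}{\sim} \Gamma(\xi_i, \gamma_i/\xi_i)$.
For $K_i \to \infty$: $E[\Gamma_i \mid X_i, \vartheta_i] \to \left(E\left[\log(X_{i,k}/L_i) \mid \Gamma_i = \gamma_i\right]\right)^{-1} = \gamma_i$.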

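b) For $\text{Vco}(\vartheta_i^{(m)} \mid \Gamma_i) \to 0$: $E[\Gamma_i \mid X_i, \vartheta_i] \to \vartheta_i^{(m)}$, $m = 1, \ldots, M_i$.

c) Assume, given $\Gamma_i = \gamma_i$, $X_{i,k} \overset{\text{i.i.d.}}{\sim} \text{Pareto}(\gamma_i, L_i)$ and $\vartheta_i^{(m)} \overset{\text{i.i.d.}}{\sim} \Gamma(\xi_i, \gamma_i/\xi_i)$.
For $M_i \to \infty$: $E[\Gamma_i \mid X_i, \vartheta_i] \to E[\vartheta_i^{(m)} \mid \Gamma_i = \gamma_i] = \gamma_i$.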

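d) For $\text{Vco}(\vartheta_i^{(m)} \mid \Gamma_i) \to \infty$, $m = 1, \ldots, M_i$:

$$
E[\Gamma_i \mid X_i, \vartheta_i] \to \left(1 - \frac{\beta_0 \sum_{k=1}^{K_i} \log\frac{X_{i,k}}{L_i}}{\beta_0 \sum_{k=1}^{K_i} \log\frac{X_{i,k}}{L_i} + 1}\right) E[\Gamma_i] + \frac{\beta_0 \sum_{k=1}^{K_i} \log\frac{X_{i,k}}{L_i}}{\beta_0 \sum_{k=1}^{K_i} \log\frac{X_{i,k}}{L_i} + 1}\, \gamma_i^{\text{MLE}}.
$$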


e) For $E[\Gamma_i] = \text{constant}$ and $\text{Vco}(\Gamma_i) \to 0$: $E[\Gamma_i \mid X_i, \vartheta_i] \to E[\Gamma_i]$.

Remarks 4.8: 

• Theorem 4.7 basically says that the higher the precision of a particular source of
risk information, the higher its corresponding credibility weight. This means that
we obtain the same interpretations as for Theorem 3.6 and Formula (4.6).

• Observe that in Section 3 and Section 4.1 we have applied Bayesian inference to
the expected values of the Poisson and the normal distribution, respectively. However,
Bayesian inference is much more general, and basically, can be applied to any
reasonable parameter. In this Section 4.2 it is, e.g., applied to the Pareto tail index.

• Observe that Model Assumptions 4.4 and 4.6 lead to an infinite mean model because
the Pareto parameter $\Gamma_i$ can be less than one with positive probability. For finite
mean models, the range of possible $\Gamma_i$ has to be restricted to $\Gamma_i > 1$. This does not
impose difficulties; for more details we refer the reader to Shevchenko and Wüthrich
[24], Section 3.4. □



4.3 Implementation and practical application 

Note that the update process of (4.9) and (4.10) has again a simple linear form when new
information arrives. The posterior density (4.8) does not change its type every time a new
observation arrives. In particular, only the parameter $\omega$ is affected by the value of a new
observation.

Information update process. Loss $k$ → loss $k + 1$:

$$
\begin{aligned}
\nu_{k+1} &= \nu_k + 1, \\
\omega_{k+1} &= \omega_k + \log\frac{X_{i,k+1}}{L_i}, \\
\phi_{k+1} &= \phi_k.
\end{aligned}
\tag{4.13}
$$



The following example shows the simplicity and robustness of the estimator developed. 

Example 4.9 Assume that a bank would like to model its risk severity by a Pareto
distribution with tail index $\Gamma$. The regulator provides external prior data, saying that
$\Gamma \sim \Gamma(\alpha_0, \beta_0)$ with $\alpha_0 = 4$ and $\beta_0 = 9/8$, i.e., $E[\Gamma] = 4.5$ and $\text{Vco}(\Gamma) = 0.5$. The bank
has one expert opinion $\vartheta = 3.5$ with $\text{Vco}(\vartheta \mid \Gamma) = 0.5$, i.e., $\xi = 4$. We then observe the
following losses (sampled from a Pareto($\alpha = 4$, $L = 1$) distribution); see also Figure 3.

Figure 3: 15 loss severities sampled from a Pareto($\alpha = 4$, $L = 1$) distribution.



In Figure 4 we compare the Bayes estimator

$$
\gamma_k = E[\Gamma \mid X_1, \ldots, X_k, \vartheta],
\tag{4.14}
$$

given by (4.11), with the estimator proposed in Shevchenko and Wüthrich [24] without
expert opinions

$$
\gamma_k^{\text{SW}} = E[\Gamma \mid X_1, \ldots, X_k],
\tag{4.15}
$$

and the classical maximum likelihood estimator

$$
\gamma_k^{\text{MLE}} = \frac{k}{\sum_{i=1}^{k} \log\frac{X_i}{L}}.
\tag{4.16}
$$

Figure 4: The Bayes estimator including expert opinions (○) is compared with
the Bayes estimator without expert opinions (△) and with the maximum likelihood
estimator (+). Horizontal axis: loss index.


Figure 4 shows the high volatility of the maximum likelihood estimator for small numbers
$k$. It is very sensitive to newly arriving losses. However, the estimator proposed in this
paper shows a much more stable behavior around the true value $\alpha = 4$, most notably
when only a few data points are available. □
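
The comparison in Example 4.9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 15 loss values of the example are not reproduced here, so new losses are simulated with a fixed seed and the numbers will not match Figure 4; the Bessel-function ratio in (4.11) is replaced by a plain numerical integration of the posterior density; and all function names (`gig_mean`, `bayes_with_expert`, `bayes_without_expert`, `mle`) are hypothetical. The estimator without expert opinions uses the fact that for $M = 0$ the posterior (4.9) reduces to a Gamma distribution with mean $(\alpha_0 + k)/\omega$.

```python
import math
import random

def gig_mean(nu, omega, phi, lo=-25.0, hi=25.0, n=5000):
    """Mean of the density proportional to x**nu * exp(-omega*x - phi/x), x > 0.

    Uses an equally spaced sum after substituting x = exp(t); the integrand
    is negligible at both ends of [lo, hi] for the parameters used here.
    """
    h = (hi - lo) / n
    ts = [lo + j * h for j in range(n + 1)]
    # log of x**nu * exp(-omega*x - phi/x) * dx/dt with x = exp(t)
    logs = [(nu + 1.0) * t - omega * math.exp(t) - phi * math.exp(-t) for t in ts]
    shift = max(logs)  # avoids overflow and underflow in exp
    den = sum(math.exp(v - shift) for v in logs)
    num = sum(math.exp(v + t - shift) for v, t in zip(logs, ts))
    return num / den

def bayes_with_expert(losses, alpha0, beta0, xi, opinions, L=1.0):
    """Posterior mean of the tail index with parameters from (4.9)."""
    nu = alpha0 - 1.0 - len(opinions) * xi + len(losses)
    omega = 1.0 / beta0 + sum(math.log(x / L) for x in losses)
    phi = xi * sum(opinions)  # xi * M * (average opinion)
    return gig_mean(nu, omega, phi)

def bayes_without_expert(losses, alpha0, beta0, L=1.0):
    """M = 0 gives phi = 0, i.e. a Gamma posterior with mean (nu + 1) / omega."""
    omega = 1.0 / beta0 + sum(math.log(x / L) for x in losses)
    return (alpha0 + len(losses)) / omega

def mle(losses, L=1.0):
    """Maximum likelihood estimator (4.16)."""
    return len(losses) / sum(math.log(x / L) for x in losses)

ALPHA0, BETA0 = 4.0, 9.0 / 8.0   # prior from Example 4.9
XI, OPINIONS = 4.0, [3.5]        # one expert opinion with Vco = 0.5
rng = random.Random(1)
# Inverse-transform sampling: L * U**(-1/alpha) is Pareto(alpha, L) for U in (0, 1]
losses = [(1.0 - rng.random()) ** (-1.0 / 4.0) for _ in range(15)]
for k in range(1, len(losses) + 1):
    xs = losses[:k]
    print(k,
          round(bayes_with_expert(xs, ALPHA0, BETA0, XI, OPINIONS), 3),
          round(bayes_without_expert(xs, ALPHA0, BETA0), 3),
          round(mle(xs), 3))
```

Before any loss is observed, the estimator with expert opinion already lies between the expert value 3.5 and the prior mean 4.5, which is the stabilizing effect described in the example.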

This example also shows that when modeling severities of operational risk, Bayesian
inference is a suitable method to combine different sources of information. The consideration
of relevant external data and well-specified expert opinions stabilizes and smooths the
estimator in an appropriate way.

5 Total loss distribution and risk capital estimates



In the preceding sections we have described how the parameters of the distributions are 
estimated. According to the Basel II requirements (see BIS [6]) the final bank capital 
should be calculated as a sum of the risk measures in the risk cells if the bank's model 
cannot account for correlations between risks accurately. If this is the case, then one needs 
to calculate VaR for each risk cell separately and sum VaRs over risk cells to estimate 
the total bank capital. Adding quantiles over the risk cells to find the quantile of the 
total loss distribution is sometimes too conservative. It is equivalent to the assumption 
of perfect dependence between risks. 

The calculation of VaR (taking into account parameter uncertainty) for each risk cell can,
in view of the previous sections, easily be done using a simulation approach described
in Shevchenko and Wüthrich [24], Section 6. Simulation procedures for independent risk
cells and in the case of dependence between risks are also described in Shevchenko and
Wüthrich [24] and thus we refrain from commenting further on this issue.
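
For orientation only, the following Python sketch shows one generic way in which parameter uncertainty can enter a Monte Carlo estimate of VaR for a single risk cell: in every simulation the parameters are first drawn from their posterior distributions and the annual loss is then simulated conditionally on them. It is not the procedure of Shevchenko and Wüthrich [24], which is not reproduced here. The sketch assumes a Poisson frequency and a Pareto severity whose posteriors are Gamma distributions (the case without expert opinions); all function names and posterior parameters are hypothetical.

```python
import math
import random

def poisson_draw(rng, lam):
    """Poisson(lam) by Knuth's multiplication method (adequate for small lam)."""
    limit, count, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return count
        count += 1

def simulate_annual_losses(n_sims, freq_post, tail_post, threshold, rng):
    """Annual losses of one risk cell with parameter uncertainty.

    freq_post, tail_post: (shape, scale) of the Gamma posteriors of the
    Poisson intensity and of the Pareto tail index.
    """
    totals = []
    for _ in range(n_sims):
        lam = rng.gammavariate(*freq_post)     # step 1: draw the parameters
        gamma = rng.gammavariate(*tail_post)
        n = poisson_draw(rng, lam)             # step 2: draw the frequency
        # step 3: Pareto severities by inversion, X = L * U**(-1/gamma)
        totals.append(sum(threshold * (1.0 - rng.random()) ** (-1.0 / gamma)
                          for _ in range(n)))
    return totals

def empirical_quantile(values, q):
    """Smallest order statistic whose empirical probability is at least q."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]

rng = random.Random(2024)
# Two hypothetical risk cells; the posterior parameters are made up
cells = [((6.0, 0.5), (19.0, 0.22)), ((3.0, 0.4), (12.0, 0.30))]
var_per_cell = [empirical_quantile(
    simulate_annual_losses(20000, f, t, 1.0, rng), 0.999) for f, t in cells]
print(var_per_cell, sum(var_per_cell))
```

The last line sums the cell-level quantiles, which corresponds to the conservative aggregation discussed at the beginning of this section.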

However, reasonable aggregation is still an open challenging problem that needs further
investigation. The choice of appropriate dependence structures is crucial and determines
the amount of diversification. In the general case, when no information about the
dependence structure is available, Embrechts and Puccetti [15] work out bounds for aggregated
operational risk capital; for further issues regarding aggregation we would like to refer to
Embrechts et al. [14].

6 Conclusion 

In this paper we propose a novel approach that allows for combining three data sources: 
internal data, external data and expert opinions. The approach is based on the Bayesian 
inference method. It is applied to the quantification of the frequency and severity distri- 
butions in operational risk, where there is a strong need for such a method to meet the 
Basel II regulatory requirements. 

The method is based on specifying prior distributions for the parameters of the frequency
and severity distributions using industry data. Then, the prior distributions are weighted
by the actual observations and expert opinions from the bank to estimate the posterior
distributions of the model parameters. These are used to estimate the annual loss
distribution for the next reporting year. Estimation of low frequency risks using this method
has several appealing features such as: stable estimators, simple calculations (in the case
of conjugate priors), and the ability to take expert opinions and industry data into
account. This method also allows for calculation of VaR with parameter uncertainty taken
into account.

For convenience we have assumed that expert opinions are i.i.d. but all formulas can easily 
be generalized to the case of expert opinions modeled by different distributions. 
It would be ideal if the industry risk profiles (prior distributions for frequency and severity 
parameters in risk cells) are calculated and provided by the regulators to ensure consis- 
tency across the banks. Unfortunately this may not be realistic at the moment. Banks 
might thus estimate the industry risk profiles using industry data available through ex- 
ternal databases from vendors and consortia of banks. The data quality, reporting and 
survival biases in external databases are the issues that should be considered in practice 
but go beyond the purposes of this paper. 

The approach described is not too complicated and is well suited for operational risk 
quantification. It has a simple structure, which is beneficial for practical use and can 
engage the bank risk managers, statisticians and regulators in productive model develop- 
ment and risk assessment. The model provides a framework that can be developed further 
by considering other distribution types, dependencies between risks and dependence on 
time. 

One of the features of the described method is that the variance of the posterior
distribution $\pi(\gamma \mid \cdot)$ will converge to zero for a large number of observations. That is, the true
values of the risk parameters will be known exactly. However, there are many factors (for
example, political, economic, legal, etc.) changing in time that will not permit
precise knowledge of the risk parameters. One can model this by limiting the variance
of the posterior distribution by some lower levels (say, e.g., 5%). This has been done in
many solvency approaches for the insurance industry; see, e.g., the Swiss Solvency Test,
FOPI [16], formulas (25)-(26).

Although the main motivation for the present paper is an urgent need from
operational risk practitioners, the proposed method is also useful in other areas (such as
credit risk, insurance, environmental risk, ecology, etc.) where, mainly due to lack of
internal observations, a combination of internal data with external data and expert opinions
is required.

A Generating realizations from a GIG random variable

For practical purposes it is required to generate realizations of a random variable $X \sim
\text{GIG}(\omega, \phi, \nu)$ with $\omega, \phi > 0$. Observe that we need to construct a special algorithm since
we cannot invert the distribution function analytically. The following algorithm can be
found in Dagpunar [8]; see also McNeil et al. [19]:

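Algorithm A.1 (Generalized inverse Gaussian)

1. $\alpha = \sqrt{\omega/\phi}$; $\beta = 2\sqrt{\omega\phi}$,
   $m = \frac{1}{\beta}\left(\nu + \sqrt{\nu^2 + \beta^2}\right)$,
   $g(y) = \frac{1}{2}\beta y^3 - y^2\left(\frac{1}{2}\beta m + \nu + 2\right) + y\left(\nu m - \frac{\beta}{2}\right) + \frac{1}{2}\beta m$.

2. Set $y_0 = m$,
   While $g(y_0) \le 0$ do $y_0 = 2y_0$,
   $y_+$: root of $g$ in the interval $(m, y_0)$,
   $y_-$: root of $g$ in the interval $(0, m)$.

3. $a = (y_+ - m)\left(\frac{y_+}{m}\right)^{\nu/2} \exp\left(-\frac{\beta}{4}\left(y_+ + \frac{1}{y_+} - m - \frac{1}{m}\right)\right)$,
   $b = (y_- - m)\left(\frac{y_-}{m}\right)^{\nu/2} \exp\left(-\frac{\beta}{4}\left(y_- + \frac{1}{y_-} - m - \frac{1}{m}\right)\right)$,
   $c = -\frac{\beta}{4}\left(m + \frac{1}{m}\right) + \frac{\nu}{2}\log(m)$.

4. Repeat $U, V \sim \text{Unif}(0, 1)$, $Y = m + a\frac{V}{U} + b\frac{1 - V}{U}$,
   until $Y > 0$ and $-\log U \ge -\frac{\nu}{2}\log Y + \frac{\beta}{4}\left(Y + \frac{1}{Y}\right) + c$,
   Then $X = Y/\alpha$ is $\text{GIG}(\omega, \phi, \nu)$; see Dagpunar [8].

To generate a sequence of $n$ realizations from a GIG random variable, step 4 is repeated
$n$ times. □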
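
A minimal Python sketch of Algorithm A.1, using only the standard library, might look as follows. It assumes the GIG density is proportional to $x^{\nu} e^{-\omega x - \phi/x}$, takes the ratio in step 4 as $V/U$ with the acceptance test on $U$, and finds the two roots of $g$ by simple bisection; the names `make_gig_sampler` and `_bisect` are hypothetical.

```python
import math
import random

def _bisect(g, lo, hi, tol=1e-12):
    """Root of g in (lo, hi) by bisection; g(lo) and g(hi) must differ in sign."""
    g_lo_positive = g(lo) > 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0.0) == g_lo_positive:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * max(1.0, hi):
            break
    return 0.5 * (lo + hi)

def make_gig_sampler(omega, phi, nu, rng=None):
    """Return a function drawing from the density ~ x**nu * exp(-omega*x - phi/x)."""
    rng = rng or random.Random()
    # Step 1: rescaling constants, mode m of the rescaled density, cubic g
    alpha = math.sqrt(omega / phi)
    beta = 2.0 * math.sqrt(omega * phi)
    m = (nu + math.sqrt(nu * nu + beta * beta)) / beta

    def g(y):
        return (0.5 * beta * y**3 - y**2 * (0.5 * beta * m + nu + 2.0)
                + y * (nu * m - 0.5 * beta) + 0.5 * beta * m)

    # Step 2: bracket and locate the two positive roots of g
    y0 = m
    while g(y0) <= 0.0:
        y0 *= 2.0
    y_plus = _bisect(g, m, y0)
    y_minus = _bisect(g, 0.0, m)

    # Step 3: bounds a, b of the enclosing rectangle and the constant c
    def log_sqrt_ratio(y):
        return 0.5 * nu * math.log(y / m) - 0.25 * beta * (y + 1.0 / y - m - 1.0 / m)

    a = (y_plus - m) * math.exp(log_sqrt_ratio(y_plus))
    b = (y_minus - m) * math.exp(log_sqrt_ratio(y_minus))
    c = -0.25 * beta * (m + 1.0 / m) + 0.5 * nu * math.log(m)

    # Step 4: ratio-of-uniforms rejection loop
    def sample():
        while True:
            u = 1.0 - rng.random()  # in (0, 1], so log(u) is defined
            v = rng.random()
            y = m + (a * v + b * (1.0 - v)) / u
            if y > 0.0 and -math.log(u) >= (-0.5 * nu * math.log(y)
                                            + 0.25 * beta * (y + 1.0 / y) + c):
                return y / alpha

    return sample

draw = make_gig_sampler(omega=1.0, phi=1.0, nu=0.5, rng=random.Random(12345))
sample = [draw() for _ in range(20000)]
print(sum(sample) / len(sample))
```

For $\omega = \phi = 1$ and $\nu = 0.5$ the mean is $K_{5/2}(2)/K_{3/2}(2) = 13/6$, which gives a simple check of the simulated values.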

B Asymptotic results for modified Bessel functions 

Let $K_\nu(z)$ denote the modified Bessel function of the third kind as defined in (3.6).

Lemma B.1 With notation (3.10), we have the following asymptotic relation for $\nu \to
\infty$, for all $a, b > 0$:

$$
R_{b\nu}(a\sqrt{\nu}) \sim \frac{2b\sqrt{\nu}}{a}.
\tag{B.1}
$$

Proof: From Abramowitz and Stegun [1], Paragraph 9.7.8, and Olver [21], Chapter 4,
we may deduce for large $\nu$ and $z > 0$

$$
K_\nu(\nu z) = \sqrt{\frac{\pi}{2\nu}}\, \frac{\exp\left(-\nu\sqrt{1 + z^2}\right)}{(1 + z^2)^{1/4}} \left(\frac{z}{1 + \sqrt{1 + z^2}}\right)^{-\nu} \left[1 + \varepsilon(\nu, z)\right],
\tag{B.2}
$$

where the error term $\varepsilon(\nu, z)$ is bounded by

$$
|\varepsilon(\nu, z)| \le [\ldots] \int |u_1'(s)|\, ds,
\tag{B.3}
$$

with [...] and $u_1(s) = (3s - 5s^3)/24$; see Abramowitz and Stegun [1] for details.
In (B.2) we replace $\nu$ by $b\nu$ and $z$ by $z_1 = a/(b\sqrt{\nu})$. The error term $\varepsilon(b\nu, a/(b\sqrt{\nu}))$ in (B.2) is
then vanishing for $\nu \to \infty$, because the right-hand side of (B.3) tends to 0. Analogously,
we replace $\nu$ by $b\nu + 1$ and $z$ by $z_2 = a\sqrt{\nu}/(b\nu + 1)$ and observe that $\varepsilon(b\nu + 1, a\sqrt{\nu}/(b\nu + 1))$
tends to 0. Thus, (B.2) gives us asymptotic expressions for $K_{b\nu}(a\sqrt{\nu})$ and $K_{b\nu+1}(a\sqrt{\nu})$.
Straightforward calculations then yield

$$
\frac{K_{b\nu+1}(a\sqrt{\nu})}{K_{b\nu}(a\sqrt{\nu})} \sim \frac{2}{z_2} \sim \frac{2b\sqrt{\nu}}{a}.
\tag{B.4}
$$

This completes the proof. □




C Proof of Theorem 3.6

Proof: With (3.11) the proof of this theorem is straightforward, using Lemma B.1 in
Appendix B. The following statements hold P-almost surely.



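b,c) $\sqrt{\phi/\omega}\, R_{\nu+1}\left(2\sqrt{\omega\phi}\right) \sim \sqrt{\phi/\omega}\, R_{-M_i\xi_i}\left(2\sqrt{\omega\, \xi_i M_i \bar\vartheta_i}\right)$.

d) If $\phi = 0$, we are in the Gamma case $\Gamma(\alpha, \beta)$ with $\alpha = \alpha_0 + K_i \bar N_i$ and $\beta =
\beta_0/(K_i V_i \beta_0 + 1)$. Hence,

$$
E[\Lambda_i \mid N_i, \vartheta_i] = \alpha\beta = \frac{1}{K_i V_i \beta_0 + 1}\, E[\Lambda_i] + \left(1 - \frac{1}{K_i V_i \beta_0 + 1}\right) \frac{\bar N_i}{V_i}.
$$

e) $\sqrt{\phi/\omega}\, R_{\nu+1}\left(2\sqrt{\omega\phi}\right) \sim \sqrt{\phi\beta_0}\, R_{\alpha_0}\left(2\sqrt{\phi/\beta_0}\right) \sim E[\Lambda_i]$.

□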